Goto

Collaborating Authors

 minimal model




Non-Monotonic S4F Standpoint Logic (Extended Version with Proofs)

Gorczyca, Piotr, Strass, Hannes

arXiv.org Artificial Intelligence

Standpoint logics offer unified modal logic-based formalisms for representing multiple heterogeneous viewpoints. At the same time, many non-monotonic reasoning frameworks can be naturally captured using modal logics - in particular using the modal logic S4F. In this work, we propose a novel formalism called S4F Standpoint Logic, which generalises both S4F and propositional standpoint logic and is therefore capable of expressing multi-viewpoint, non-monotonic semantic commitments. We define its syntax and semantics and analyze its computational complexity, obtaining the result that S4F Standpoint Logic is not computationally harder than its constituent logics, whether in monotonic or non-monotonic form. We also outline mechanisms for credulous and sceptical acceptance and illustrate the framework with an example.



Riddled basin geometry sets fundamental limits to predictability and reproducibility in deep learning

Ly, Andrew, Gong, Pulin

arXiv.org Artificial Intelligence

Fundamental limits to predictability are central to our understanding of many physical and computational systems. Here we show that, despite its remarkable capabilities, deep learning exhibits such fundamental limits rooted in the fractal, riddled geometry of its basins of attraction: any initialization that leads to one solution lies arbitrarily close to another that leads to a different one. We derive sufficient conditions for the emergence of riddled basins by analytically linking features widely observed in deep learning, including chaotic learning dynamics and symmetry-induced invariant subspaces, to reveal a general route to riddling in realistic deep networks. The resulting basins of attraction possess an infinitely fine-scale fractal structure characterized by an uncertainty exponent near zero, so that even large increases in the precision of initial conditions yield only marginal gains in outcome predictability. Riddling thus imposes a fundamental limit on the predictability and hence reproducibility of neural network training, providing a unified account of many empirical observations. These results reveal a general organizing principle of deep learning with important implications for optimization and the safe deployment of artificial intelligence.



Minimal Model Reasoning in Description Logics: Don't Try This at Home!

Di Stefano, Federica, Manière, Quentin, Ortiz, Magdalena, Šimkus, Mantas

arXiv.org Artificial Intelligence

Reasoning with minimal models has always been at the core of many knowledge representation techniques, but we still have only a limited understanding of this problem in Description Logics (DLs). Minimization of some selected predicates, letting the remaining predicates vary or be fixed, as proposed in circumscription, has been explored and exhibits high complexity. The case of `pure' minimal models, where the extension of all predicates must be minimal, has remained largely uncharted. We address this problem in popular DLs and obtain surprisingly negative results: concept satisfiability in minimal models is undecidable already for $\mathcal{EL}$. This undecidability also extends to a very restricted fragment of tuple-generating dependencies. To regain decidability, we impose acyclicity conditions on the TBox that bring the worst-case complexity below double exponential time and allow us to establish a connection with the recently studied pointwise circumscription; we also derive results in data complexity. We conclude with a brief excursion to the DL-Lite family, where a positive result was known for DL-Lite$_{\text{core}}$, but our investigation establishes ExpSpace-hardness already for its extension DL-Lite$_{\text{horn}}$.


Forgetting in short and heterogeneous sequences of belief revisions

Liberatore, Paolo

arXiv.org Artificial Intelligence

Forgetting a specific belief revision episode may not erase information because the other revisions may provide or entail the same information. Whether it does was proved coNP-hard for sequences of two arbitrary lexicographic revisions or arbitrarily long lexicographic Horn revisions. A polynomial algorithm is presented for the case of two lexicographic Horn revision. Heterogeneous sequences, including revisions other than lexicographic, were proved to belong in Delta2. Their previously proved coNP-hardness is enhanced to Dp-hardness.


Differential learning kinetics govern the transition from memorization to generalization during in-context learning

Nguyen, Alex, Reddy, Gautam

arXiv.org Artificial Intelligence

Transformers exhibit in-context learning (ICL): the ability to use novel information presented in the context without additional weight updates. Recent work shows that ICL emerges when models are trained on a sufficiently diverse set of tasks and the transition from memorization to generalization is sharp with increasing task diversity. One interpretation is that a network's limited capacity to memorize favors generalization. Here, we examine the mechanistic underpinnings of this transition using a small transformer applied to a synthetic ICL task. Using theory and experiment, we show that the sub-circuits that memorize and generalize can be viewed as largely independent. The relative rates at which these sub-circuits learn explains the transition from memorization to generalization, rather than capacity constraints. We uncover a memorization scaling law, which determines the task diversity threshold at which the network generalizes. The theory quantitatively explains a variety of other ICL-related phenomena, including the long-tailed distribution of when ICL is acquired, the bimodal behavior of solutions close to the task diversity threshold, the influence of contextual and data distributional statistics on ICL, and the transient nature of ICL.


Disentangling Linear Mode-Connectivity

Altintas, Gul Sena, Bachmann, Gregor, Noci, Lorenzo, Hofmann, Thomas

arXiv.org Artificial Intelligence

Linear mode-connectivity (LMC) (or lack thereof) is one of the intriguing characteristics of neural network loss landscapes. While empirically well established, it unfortunately still lacks a proper theoretical understanding. Even worse, although empirical data points are abound, a systematic study of when networks exhibit LMC is largely missing in the literature. In this work we aim to close this gap. We explore how LMC is affected by three factors: (1) architecture (sparsity, weight-sharing), (2) training strategy (optimization setup) as well as (3) the underlying dataset. We place particular emphasis on minimal but non-trivial settings, removing as much unnecessary complexity as possible. We believe that our insights can guide future theoretical works on uncovering the inner workings of LMC.